Fast High-Dimensional Data Search in Incomplete Databases

Authors

Beng Chin Ooi

Cheng Hian Goh

Kian-Lee Tan

Abstract

We propose and evaluate two indexing schemes for improving the efficiency of data retrieval in high-dimensional databases that are incomplete. These schemes are novel in that the search keys may contain missing attribute values. The first is a multi-dimensional index structure called the Bitstring-augmented R-tree (BR-tree). The second comprises a family of multiple one-dimensional one-attribute (MOSAIC) indexes. Our results show that both schemes can outperform exhaustive search. Experimental results suggest that, compared with the MOSAIC indexing scheme, BR-trees have lower update and storage costs and support range queries more efficiently under most circumstances. However, contrary to conventional wisdom, the MOSAIC structure outperforms the BR-tree in retrieval time for point queries. It also outperforms the BR-tree for range queries over incomplete databases with dimension-unrestricted data distributions.
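
The abstract does not describe how a MOSAIC-style scheme answers a query when values are missing. The following is a minimal sketch of the general idea under stated assumptions, with an exhaustive scan as the baseline. It keeps one sorted index per attribute and intersects the per-attribute answers. It is not the authors' implementation. The names `AttributeIndex`, `MosaicIndex`, and `exhaustive_search` are hypothetical, as is the `MISSING` sentinel. The rule that a record with a missing value on a queried attribute counts as a candidate match is also an assumption. Attributes omitted from the query stand in for missing values in the search key.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical sentinel: a missing value (None) is stored under -inf,
# so all missing entries form one contiguous run at the front of each index.
MISSING = float("-inf")

Row = Sequence[Optional[float]]
Query = Dict[int, Tuple[float, float]]  # attribute position -> inclusive [lo, hi]


class AttributeIndex:
    """Sorted one-attribute index (an in-memory stand-in for a disk-based tree)."""

    def __init__(self, column: Sequence[Optional[float]]) -> None:
        pairs = sorted((MISSING if v is None else v, rid) for rid, v in enumerate(column))
        self.keys: List[float] = [k for k, _ in pairs]
        self.rids: List[int] = [rid for _, rid in pairs]

    def scan_range(self, lo: float, hi: float) -> Set[int]:
        # Record IDs whose key lies in the inclusive interval [lo, hi].
        i = bisect_left(self.keys, lo)
        j = bisect_right(self.keys, hi)
        return set(self.rids[i:j])

    def missing(self) -> Set[int]:
        # Record IDs whose value for this attribute is missing.
        return self.scan_range(MISSING, MISSING)


class MosaicIndex:
    """One AttributeIndex per dimension; a query intersects per-attribute answers."""

    def __init__(self, rows: Sequence[Row]) -> None:
        self.n = len(rows)
        dims = len(rows[0]) if rows else 0
        self.indexes = [AttributeIndex([row[d] for row in rows]) for d in range(dims)]

    def query(self, query: Query) -> Set[int]:
        result = set(range(self.n))  # unqueried attributes do not restrict anything
        for attr, (lo, hi) in query.items():
            idx = self.indexes[attr]
            # Assumed semantics: a missing value counts as a candidate match.
            result &= idx.scan_range(lo, hi) | idx.missing()
        return result


def exhaustive_search(rows: Sequence[Row], query: Query) -> Set[int]:
    """Baseline: scan every record and test every queried attribute."""
    return {
        rid
        for rid, row in enumerate(rows)
        if all(row[a] is None or lo <= row[a] <= hi for a, (lo, hi) in query.items())
    }


ROWS = [
    (1.0, 5.0, None),
    (2.0, None, 3.0),
    (None, 7.0, 4.0),
    (9.0, 6.0, 1.0),
]
QUERY: Query = {0: (0.0, 3.0), 1: (4.0, 6.0)}  # attribute 2 left unspecified

print(sorted(MosaicIndex(ROWS).query(QUERY)))  # [0, 1]
print(sorted(exhaustive_search(ROWS, QUERY)))  # [0, 1]
```

Dropping the `| idx.missing()` term gives the stricter semantics, under which a missing value never satisfies a predicate. The sorted lists only illustrate the per-attribute lookup. A practical index would keep each attribute in a disk-resident structure rather than in memory.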

Similar resources

Fast Near Neighbor Search in High-Dimensional Binary Data

Numerous applications in search, databases, machine learning, and computer vision can benefit from efficient algorithms for near neighbor search. This paper proposes a simple framework for fast near neighbor search in high-dimensional binary data, which are common in practice (e.g., text). We develop a very simple and effective strategy for sub-linear time near neighbor search, by creating has...

MLR-Index: An Index Structure for Fast and Scalable Similarity Search in High Dimensions

High-dimensional indexing has been widely used for performing similarity search over various data types such as multimedia (audio/image/video) databases, document collections, time-series data, sensor data and scientific databases. Because of the curse of dimensionality, well-known data structures such as the kd-tree, R-tree, and M-tree are known to suffer in their performance over...

RIVA: Indexing and Visualization of High-Dimensional Data Via Dimension Reorderings

We propose a new representation for high-dimensional data that can prove very effective for visualization, nearest neighbor (NN) and range searches. It has been unequivocally demonstrated that existing index structures cannot facilitate efficient search in high-dimensional spaces. We show that a transformation from points to sequences can potentially diminish the negative effects of the dimensi...

Fast Nearest-Neighbor Search Algorithms Based on High-Multidimensional Data

Similarity search in multimedia databases requires an efficient support of nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state of the art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore pre-compute the result of any nearest-neighbor s...

Detecting High-Dimensional Outliers: the New Task, Algorithms and Performance

Outlier detection is a fundamental step in knowledge discovery in databases. With the increasing number of high-dimensional databases, existing outlier detection algorithms that work only in the context of the full space are unable to effectively screen out informative outliers. This is because the majority of these outliers exist only in subspaces. In this paper, we identify a new outlier detection t...

Publication date: 1998
